bias analysis
Unveiling Modality Bias: Automated Sample-Specific Analysis for Multimodal Misinformation Benchmarks
Lin, Hehai, Liu, Hui, Cao, Shilei, Li, Jing, Li, Haoliang, Wang, Wenya
Numerous multimodal misinformation benchmarks exhibit bias toward specific modalities, allowing detectors to make predictions based solely on one modality. While previous research has quantified bias at the dataset level or manually identified spurious correlations between modalities and labels, these approaches lack meaningful insights at the sample level and struggle to scale to the vast amount of online information. In this paper, we investigate the design for automated recognition of modality bias at the sample level. Specifically, we propose three bias quantification methods based on theories/views of different levels of granularity: 1) a coarse-grained evaluation of modality benefit; 2) a medium-grained quantification of information flow; and 3) a fine-grained causality analysis. To verify the effectiveness, we conduct a human evaluation on two popular benchmarks. Experimental results reveal three interesting findings that provide potential direction toward future research: 1) Ensembling multiple views is crucial for reliable automated analysis; 2) Automated analysis is prone to detector-induced fluctuations; and 3) Different views produce a higher agreement on modality-balanced samples but diverge on biased ones.
Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
Udagawa, Takuma, Zhao, Yang, Kanayama, Hiroshi, Bhattacharjee, Bishwaranjan
Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
Fairness at Every Intersection: Uncovering and Mitigating Intersectional Biases in Multimodal Clinical Predictions
Ramachandranpillai, Resmi, Sampath, Kishore, Mohammad, Ayaazuddin, Alikhani, Malihe
Biases in automated clinical decision-making using Electronic Healthcare Records (EHR) impose significant disparities in patient care and treatment outcomes. Conventional approaches have primarily focused on bias mitigation strategies stemming from single attributes, overlooking intersectional subgroups -- groups formed across various demographic intersections (such as race, gender, ethnicity, etc.). Applying single-attribute mitigation strategies to intersectional subgroups becomes statistically irrelevant due to the varying distribution and bias patterns across these subgroups. The multimodal nature of EHR -- data from various sources such as combinations of text, time series, tabular, events, and images -- adds another layer of complexity as the influence on minority groups may fluctuate across modalities. In this paper, we take the initial steps to uncover potential intersectional biases in predictions by sourcing extensive multimodal datasets, MIMIC-Eye and MIMIC-IV ED, and propose mitigation at the intersectional subgroup level. We perform and benchmark downstream tasks and bias evaluation on the datasets by learning a unified text representation from multimodal sources, harnessing the enormous capabilities of the pre-trained clinical Language Models (LM), MedBERT, Clinical BERT, and Clinical BioBERT. Our findings indicate that the proposed sub-group-specific bias mitigation is robust across different datasets, subgroups, and embeddings, demonstrating effectiveness in addressing intersectional biases in multimodal settings.
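To make the notion of an intersectional subgroup concrete, here is a minimal Python sketch. It is not the paper's method: the records, the attribute names (`race`, `gender`), the `pred` field, and the choice of a positive-prediction-rate gap as the disparity measure are all assumptions made for illustration. It shows how a gap that is invisible for a single attribute can appear once attributes are crossed.

```python
from collections import defaultdict

def subgroup_positive_rates(records, attributes, prediction_key="pred"):
    """Return {subgroup tuple: fraction of positive predictions}."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [positives, total]
    for rec in records:
        group = tuple(rec[a] for a in attributes)
        counts[group][0] += int(rec[prediction_key] == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def max_rate_gap(rates):
    """Largest difference in positive rate between any two subgroups."""
    values = list(rates.values())
    return max(values) - min(values)

# Hypothetical toy predictions (illustrative only, not real clinical data).
records = [
    {"race": "A", "gender": "F", "pred": 1},
    {"race": "A", "gender": "F", "pred": 1},
    {"race": "A", "gender": "M", "pred": 1},
    {"race": "A", "gender": "M", "pred": 0},
    {"race": "B", "gender": "F", "pred": 0},
    {"race": "B", "gender": "F", "pred": 0},
    {"race": "B", "gender": "M", "pred": 1},
    {"race": "B", "gender": "M", "pred": 0},
]

# Single-attribute view: both genders receive the same positive rate.
single = subgroup_positive_rates(records, ["gender"])
# Intersectional view: crossing race and gender exposes a large gap.
inter = subgroup_positive_rates(records, ["race", "gender"])

print("gender-only gap:", max_rate_gap(single))
print("race x gender gap:", max_rate_gap(inter))
```

In this toy data the gender-only rates are identical, while the crossed subgroups range from no positive predictions to all positive predictions, which is the kind of pattern a single-attribute audit would miss.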
Distributed Least Squares in Small Space via Sketching and Bias Reduction
Garg, Sachin, Tan, Kevin, Dereziński, Michał
Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
Moutakanni, Théo, Bojanowski, Piotr, Chassagnon, Guillaume, Hudelot, Céline, Joulin, Armand, LeCun, Yann, Muckley, Matthew, Oquab, Maxime, Revel, Marie-Pierre, Vakalopoulou, Maria
AI foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalisability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation, and provide an in-depth analysis of population, age and sex biases of our model. Our findings suggest that self-supervision allows patient-centric AI proving useful in clinical workflows and interpreting X-rays holistically. With RayDINO and small task-specific adapters, we reach state-of-the-art results and improve generalization to unseen populations while mitigating bias, illustrating the true promise of foundation models: versatility and robustness.
Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation
Wang, Xi, Rahmani, Hossein A., Liu, Jiqun, Yilmaz, Emine
Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.
Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media
Guo, Yuting, Rajwal, Swati, Lakamana, Sahithi, Chiang, Chia-Chun, Menell, Paul C., Shahid, Adnan H., Chen, Yi-Chieh, Chhabra, Nikita, Chao, Wan-Ju, Chao, Chieh-Ju, Schwedt, Todd J., Banerjee, Imon, Sarker, Abeed
Migraine is a high-prevalence and disabling neurological disorder. However, information about migraine management in real-world settings may be limited in traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
Race Bias Analysis of Bona Fide Errors in face anti-spoofing
Abduh, Latifah, Ivrissimtzis, Ioannis
Face recognition is the method of choice behind some of the most widely deployed biometric authentication systems, currently supporting a range of applications, from passport control at airports, to mobile phone or laptop login. A key weakness of the technology, preventing it from being employed in security-sensitive applications in uncontrolled environments, as for example ATM machines for money withdrawal, is its vulnerability to presentation attacks, where imposters attempt to gain wrongful access by presenting in front of the system's camera a photo, or a video, or by wearing a mask resembling a registered person. As a solution to this problem, algorithms for presentation attack detection (PAD) are developed, that is, binary classifiers trained to distinguish between the bona fide samples coming from live subjects, and those coming from imposters. The large variety in the types of possible presentation attacks, and the large variation in the environmental conditions under which they might take place, make PAD a particularly challenging problem. However, the current state-of-the-art, utilising the power of deep learning, comprises classifiers with excellent accuracy rates, and a satisfactory generalisation power to at least a limited number of previously unseen attacks. Cross-database generalisation is still problematic; however, it is debatable whether this is a real obstacle to the deployment of PAD algorithms in practical applications, since such algorithms are usually embedded in specific face recognition systems, with given camera specifications and configurations. Here, we deal with the problem of race bias in face anti-spoofing algorithms. It is a topic that has attracted considerably less research interest than accuracy and generalisation power, despite the fact that it raises ethical, legal, and regulatory considerations, which, on their own, can prevent adoption in specific applications.
Addressing this gap, the aim of this paper is to provide a framework for studying the question: Does the classifier work equally well on people from all races?
AWS announces SageMaker Clarify to help reduce bias in machine learning models - TechCrunch
As companies rely increasingly on machine learning models to run their businesses, it's imperative to include anti-bias measures to ensure these models are not making false or misleading assumptions. Today at AWS re:Invent, AWS introduced Amazon SageMaker Clarify to help reduce bias in machine learning models. "We are launching Amazon SageMaker Clarify. And what that does is it allows you to have insight into your data and models throughout your machine learning lifecycle," Bratin Saha, Amazon VP and general manager of machine learning told TechCrunch. He says that it is designed to analyze the data for bias before you start data prep, so you can find these kinds of problems before you even start building your model.
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets
Yang, Jianing, Zhu, Yuying, Wang, Yongxin, Yi, Ruitao, Zadeh, Amir, Morency, Louis-Philippe
Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts and jeopardize the models' ability to generalize. Understanding how strong these QA biases are and where they come from helps the community measure progress more accurately and provides researchers with insights to debug their models. In this paper, we analyze QA biases in popular video question answering datasets and discover that pretrained language models can answer 37-48% of questions correctly without using any multimodal context information, far exceeding the 20% random guess baseline for 5-choose-1 multiple-choice questions. Our ablation study shows biases can come from annotators and types of questions. Specifically, annotators that have been seen during training are better predicted by the model, and reasoning, abstract questions incur more biases than factual, direct questions. We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets.
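The annotator-non-overlapping split mentioned above can be sketched in a few lines of Python. This is an illustrative assumption about how such a split might be built, not the authors' code: the `annotator` field, the `qid` field, the `test_fraction` parameter, and the function name are hypothetical. The idea is to assign whole annotators, rather than individual questions, to the test set so that no annotator's writing style is seen in both splits.

```python
import random

def annotator_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split examples so that no annotator appears in both train and test."""
    # Sort first so the shuffle is reproducible for a given seed.
    annotators = sorted({ex["annotator"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(annotators)
    # Hold out at least one annotator for the test set.
    n_test = max(1, round(len(annotators) * test_fraction))
    test_annotators = set(annotators[:n_test])
    train = [ex for ex in examples if ex["annotator"] not in test_annotators]
    test = [ex for ex in examples if ex["annotator"] in test_annotators]
    return train, test

# Hypothetical toy dataset: 20 questions written by 5 annotators.
examples = [{"qid": i, "annotator": f"ann{i % 5}"} for i in range(20)]
train, test = annotator_disjoint_split(examples)

train_annotators = {ex["annotator"] for ex in train}
test_annotators = {ex["annotator"] for ex in test}
print(len(train), len(test), train_annotators & test_annotators)
```

A plain random split over questions would typically place every annotator in both sets, which is the situation the abstract identifies as a source of bias.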